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Abstract 

Background: Lymph node metastasis (LNM) in gastric cancer is a very important prognostic factor affecting long- 
term survival. Currently, several common imaging techniques are used to evaluate the lymph node status. However, 
they are incapable of achieving both high sensitivity and specificity simultaneously. In order to deal with this com- 
plex issue, a new evidential reasoning (ER) based model is proposed to support diagnosis of LNM in gastric cancer. 

Methods: There are 175 consecutive patients who went through multidetector computed tomography (MDCT) 
consecutively before the surgery. Eight indicators, which are serosal invasion, tumor classification, tumor 
enhancement pattern, tumor thickness, number of lymph nodes, maximum lymph node size, lymph node station 
and lymph node enhancement are utilized to evaluate the tumor and lymph node through CT images. All of the 
above indicators reflect the biological behavior of gastric cancer. An ER based model is constructed by taking the 
above indicators as input index. The output index determines whether LNM occurs for the patients, which is 
decided by the surgery and histopathology. A technique called k-fold cross-validation is used for training and test- 
ing the new model. The diagnostic capability of LNM is evaluated by receiver operating characteristic (ROC) curves. 
A Radiologist classifies LNM by adopting lymph node size for comparison. 

Results: 134 out of 175 cases are cases of LNM, and the remains are not. Eight indicators have statistically significant 
difference between the positive and negative groups. The sensitivity, specificity and AUC of the ER based model are 
88.41%, 77.57% and 0.813, respectively. However, for the radiologist evaluating LNM by maximum lymph node size, the 
corresponding values are only 63.4%, 75.6% and 0.757. Therefore, the proposed model can obtain better performance 
than the radiologist. Besides, the proposed model also outperforms other machine learning methods. 

Conclusions: According to the biological behavior information of gastric cancer, the ER based model can diagnose 
LNM effectively and preoperatively. 

Keywords: Gastric cancer, Lymph node metastasis, Evidential reasoning 



Background 

Gastric cancer has become one of the major causes of 
cancer-related deaths in the world [1]. Lymph node metas- 
tasis (LNM) is a very important prognostic factor regarding 
long-term survival [2]. The TNM a staging system based on 
American Joint Committee on Cancer is taken as the 
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evaluated standard and has been widely accepted [3]. Based 
on this standard, the 5-year survival rate of patients in NO 
stage after surgery is 86.1%, while Nl, N2 and N3 stage pa- 
tients can obtain 58.1%, 23.3% and 5.9%, respectively [4]. 

Currently, doctors diagnose LNM empirically based on 
the size of lymph nodes which relies on various imaging 
methods, such as endoscopic ultrasound (EUS), abdom- 
inal ultrasound, multi-slice spiral computerized tomog- 
raphy (CT), Magnetic Resonance Imaging (MRI) and 
Positron Emission computed Tomography (PET). How- 
ever, none of the above imaging tools can acquire the 
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lymph node status in a satisfactory way. Meanwhile, a 
systemic review shows that EUS, MDCT, conventional 
MRI, and FDG-PET cannot be used to confirm or ex- 
clude the presence of LNM reliably [2]. The reason is 
that large lymph nodes may be caused by inflammation, 
while small ones may be caused by metastasis. There- 
fore, single lymph node size is not a strong predictor. In 
fact, many studies have shown that LNM is related to 
tumor size, pathological lymphatic involvement, histo- 
logical type and other factors [5-8]. Therefore, a method 
which combines lymph node size with these factors 
should be considered. Furthermore, a few researches 
[9-11] have discussed the diagnostic capabilities of mor- 
phological characteristics in rectum cancer. According 
to these studies, the morphological characteristics in- 
cluding border contour and signal intensity of lymph 
nodes may partly improve the diagnostic ability of me- 
tastasis. However, these studies mainly focus on the MRI 
imaging in rectum cancer. For patients with gastric can- 
cer in clinical practice, abdomen CT is a more common 
used imaging modality than MRI examination. Hence, 
we consider building a model to diagnose LNM with 
multiple indicators. 

As there are qualitative and quantitative data in eight 
indicators, a method which can integrate these two types 
of data should be adopted. The ER approach was origin- 
ally proposed to deal with multiple attribute decision 
analysis problems that involve both qualitative and 
quantitative attributes under uncertainty [12]. The ker- 
nel is the ER algorithm which is developed on the basis 
of the decision theory and the Dempster-Shafer (D-S) 
theory of evidence [13,14]. As ER can integrate the 
qualitative information and quantitative data reasonably, 
it is applied. One of the aims in this paper is to analyze 
which indicators are related to the biological behavior of 
gastric cancer and construct a mathematical model to 
assess LNM preoperatively. 

Methods 

Patients 

In this experiment, 175 CT cases obtained from Peking 
University Cancer Hospital & Institute (Beijing, China 
P. R.) constitute the sample set. According to the inter- 
national treatment guideline of gastric cancer, CT is one 
of the most commonly used inspections [15]. However, 
other methods such as PET and EUS are used as selected 
check. These patients were administered preoperative 
contrast enhancement abdominal in the CT examina- 
tions and received the gastrectomy between April 2006 
and September 2008. This retrospective study was ap- 
proved by institutional review board (IRB). They were 
preoperatively examined with MDCT. Note that we have 
obtained the informed consent from all selected patients 
prior to the routine clinical course of CT examinations. 



There are 125 males and 50 females among these pa- 
tients, and their average age is 59.8 years. The details are 
shown in Table 1. 

Indicators 

There are eight indicators which were extracted by two 
radiologists, one with three years and another with eight 
years experience in abdominal CT. The eight indicators 
were measured and counted manually on MDCT images 
as follows: 

(1) Serosal invasion: Axial and MPR images are 
evaluated to determine the serosal invasion 
simultaneously. The entire thickening stomach wall 
abnormally enhances linear or reticular structures in 
the fatty layer surrounding the stomach indicated 
serosal invasion [16]. 

(2) Tumor classification: Early gastric cancer or 
Bormann classification of advanced cancer in MPR 
images is confirmed. 

(3) Tumor enhancement pattern: Tumor enhancement 
is divided into three patterns at portal phase of CT 
images, which are mucosal surface enhancement, 
homogeneous enhancement and heterogeneous 
enhancement. 

(4) Tumor thickness: The maximal thickness of tumor 
is measured at the axial CT images. 

(5) The number of lymph nodes: The number of the 
gastric regional lymph nodes with size larger than 
3 mm in MDCT images by groups is counted [17]. 
As the lymph nodes, which are smaller than 3 
mm, are too tiny to make them discernible, 

they are omitted. 

(6) Maximum lymph node size: The short axis of 
the largest lymph node detected in CT images is 
measured. 



Table 1 Patient characteristics 


Clinic pathological features 


Value 


Number of patients 


175 


Average age(y) 


59.8(30-85) 


Ratio of men to women 


125:50 


Histopathology 




Adenocarcinoma 


1 73(98.9%) 


Well differentiated 


6(3.4%) 


Moderately differentiated 


91(52%) 


Poorly differentiated 


76(43.5%) 


Small cell carcinoma 


2(1.1%) 


lymph node metastasis 




Positive 


134(76.6%) 


Negative 


41(23.4%) 
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(7) Lymph node station: The lymph node station with 
MDCT images based on the Japanese classification 
of gastric carcinoma is determined [17]. 

(8) Lymph node enhancement: It means CT attenuation 
value of lymph node, which is measured at the 
portal venous phase of CT image. 

In this paper, all the indicators are measured manu- 
ally. The number of lymph nodes is the amount of 
lymph nodes around the stomach. Maximum lymph 
node size and lymph node enhancement is extracted 
from the maximal lymph node. The objective is to pre- 
dict whether LNM occurs other than maximum lymph 
node has LNM. In other words, the object is to predict 
whether LNM occurs for each patient. The final result 
for LNM diagnosis is decided by the surgery and histo- 
pathology. The pathological result can definitely con- 
firm whether LNM occurs or not. We do not want to 
predict metastasis for each lymph node. The reason is 
that one-to-one lymph nodes correspondence with CT 
and pathology depends on very precise and excellent 
experience of radiologist. It is usually not consistent 
adequately for different radiologists, which may affect 
the prediction accuracy of the mathematical model. 
Therefore, we did not make the one-to-one corres- 
pondence for every lymph node. The details are de- 
scribed in Table 2. 

ER based model 

In this model, we represent every case by an over- 
complete dictionary whose elements are the training 
samples. If sufficient training samples are available from 
each class, it will be possible to represent the test 
sample. 

Assume that training samples are denoted by X = {X h 
X2,...,X p }e R mxn } where n is the number of training sam- 
ples, and m is the number of indicators. ye{l,2,...,p} is 
the label and p is the class index. T = [T 1 ,T 2 ...,T m ] T de- 
notes a test sample. The over-complete dictionary A is 
denoted as follows: 



Table 2 Description of eight indicators 



Patient data 



LNM(-) 



LNM(+) 



Patient number 
Measurement data 

Tumor thickness(mm) 

Maximum lymph node size(mm) 

The number of lymph nodes 

Lymph node enhancement 

Count data 
Tumor enhancement pattern 

Pattern 1 

Pattern 2 

Pattern 3 
Serosal invasion 

Yes 

No 

Tumor classification 

Early gastric cancer 

Borrmann I 

Borrmann II 

Borrmann III 

Borrmann IV 
Lymph nodes station 

Station 1 

Station 2 

Station 3 



41/175 

13.3 ±14.0 
6.5 ± 2.8 
7±4 
39.5 ± 58.5 



13/175 
26/1 75 
2/175 

15/175 
26/1 75 

9/175 
2/175 
3/175 
27/1 75 
0/175 

29/1 75 
12/175 
0/175 



134/175 

16.6 ±28.4 
10.0 ±5.5 

12 ±8 
62.5 ± 66.5 



6/175 
118/175 
10/175 

120/175 
14/175 

1/175 
0/175 
9/175 
121/175 
3/175 

44/1 75 
54/1 75 
36/1 75 



The value of the measurement data was measured manually, and the count 
data was the number of data. 
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(2) 



Here A consists of training samples and A m>n repre- 
sents every indicator in training samples. According to 
the limits of ER, the columns of A and T should be nor- 
malized firstly. Then each indicator in test sample is 
represented by A and corresponding coefficients w b i = 
1,2,. Then we utilize the ER analytic algorithm [13] 
as follows: 



(3) 



All the indicators in T can be represented by A and w b 
i = 1,2,... ,m using the ER approach. Assume that ER rep- 
resents the ER approach. Therefore, T is represented as 
follows: 
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T = ER(A, o) (4) 

where ueRn is the coefficient vector. However, it is not 
possible to guarantee the optimal solution and instead 
we replace it by the approximate solution provided in 
Equation (5): 

T~ER(A, co) (5) 

As the new case can be sufficiently represented by the 
training samples from the same class, we obtain the pre- 
diction by co. The objective is to minimize residual be- 
tween expectation output and observed output. We 
solve this problem by calculating the following l 2 
minimization problem: 

mm\\T-ER(A,a)\\l (6) 

where co is the parameter that needs to be optimized. 
The constrained condition is, 

n 

^C0; = l,C0;>(M=l,...,tf (7) 

i=l 

The commonly used single-objective optimization 
methods with constraint handling can be utilized. In this 
paper, FMINCON function in MATLAB toolbox is 
adopted. 

Then the residual rpj=l,...,n is calculated between y 
and each class in Tas Equation (8). The residual is calcu- 
lated in the / 2 -norm. For each class let Wj be the charac- 
teristic function that selects the coefficients associated 



with the ;th class. When using the coefficients associated 
with the ;th class, one can approximate the given test 
sample y as yj = ER(T,wj). 

rj(T) = \\T-ER(a>,Aj)\\l,j=l,...,p (8) 

Finally, we can classify T by assigning it to the object 
class that minimizes the residual as follows: 

identity (T) = argminr ; (r),y = 1, ...,/? (9) 

The prediction result is the same as the class with 
minimum residual. We summarize the procedures as 
follows: 

Step 1: Input dictionary A = [Ai^,...^^ and a 
new case TeR m . 

Step 2: Normalize the columns of A and y to have unit 
/ 2 -norm. 

Step 3: Solve the ^-minimization problem to obtain 
coas in Equation (6). 

Step 4: Calculate the residuals r/T) as Equation (8). 
Step 5: Output the final label for which r/T) is 
minimized. 

Results 

By univariate statistical analysis, it shows that all the in- 
dicators including serosal invasion, tumor classification, 
tumor enhancement pattern, tumor thickness, number 
of lymph nodes, maximum lymph node size, lymph 
nodes station and Lymph node enhancement are signifi- 
cant different between LNM positive and negative group 



^ 0.6- 
> 



0.2- 



o.o- 




Source of 
the Curve 

ER Model 

LN Size 

Reference 
Line 



0.0 



— r- 

1.0 



0.2 0.4 0.6 0.8 

1 - Specificity 

Figure 1 ROC curve between ER based model and radiologist. Receiver operating characteristic (ROC) curve for LNM with ER based model 
and the radiologist. The AUC of the proposed model is 0.829, while the radiologist is 0.757. 
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Table 3 AUC values of nine methods on training data 



Model 


Area 


Std 
error 


Asymptotic 
sig. 


Asymptotic 95% confidence interval 
Lower bounder Upper bounder 


ER based model 


0.948 


0.020 


0.000 


0.908 


0.988 


ANN 


0.798 


0.043 


0.000 


0.713 


0.882 


SVM linear 


0.944 


0.019 


0.000 


0.906 


0.981 


SVM Gaussian 


0.955 


0.018 


0.000 


0.920 


0.990 


SVM Ploynomial2 


0.94 


0.022 


0.000 


0.898 


0.983 


SVM PloynomiaB 


0.938 


0.022 


0.000 


0.894 


0.982 


SVM Ploynomial4 


0.941 


0.022 


0.000 


0.898 


0.983 


Logistic Regression 


0.888 


0.027 


0.000 


0.835 


0.940 



(P < 0.001). On the other hand, 5-fold cross validation is 
used in all the experiments [18]. 

Taking 7.7 mm as the best cut-off point of maximum 
lymph node size, the radiologist achieves an AUC of 
0.757 for evaluating LNM. In other words, if lymph node 
size is larger than 7.7 mm, it is considered that LNM oc- 
curs; otherwise, it does not occur. The sensitivity and 
specificity are only 63.4% and 75.6%. However, the 
sensitivity, specificity and AUC of the proposed model 
are 89.51%, 80%, and 0.829, respectively (Figure 1 and 
Table 3). According to statistical analysis, the ER based 
model performs significantly better (P < 0.05) than 
radiologists. 

On the other hand, there are many other machine 
learning methods that can be used for evaluating 
LNM. Some typical methods are used for comparisons, 



including Artificial Neural Network (ANN) [19,20], 
Support Vector Machine (SVM) and logistic regression. 
For SVM, a linear, a Gaussian and a polynomial kernel 
are tested, and LibSVM2.91 is used [21]. The linear 
SVM is named as SVM linear. For Gaussian kernel 
(named as SVM Gaussian), the regularization and kernel 
parameters are set as {2" 3 ,2" 2 ,...,2 10 }, and the highest rec- 
ognition rate is regarded as the output. For polynomial 
kernel, three degrees such as 2 (named as SVM polyno- 
mial), 3 (named as SVM polynomiaB), and 4 (named 
as SVM polynomial) are tested. The feed forward 
neural network in MATLAB toolbox is adopted for 
ANN which has a single hidden layer and the number of 
nodes is 5. Binary logistic regression in SPSS is used for 
logistic regression. 5-fold-cross validation is still used in 
all studies. Figure 2 shows the ROC and Table 3 shows 
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ER model 
SVM 

Gaussian 
SVM Linear 
SVM 

polynomial 
SVM 

polynomials 
SVM 

polynomial 

Logistic 
Regression 
Reference 
0 Line 



Figure 2 ROC curve among eight methods on training data. Figure 2 shows the ROC curve for six methods on training data. The AUC of 
SVM whose kernel function is Gaussian is the largest among eight methods. The ER based model is a slightly lower than Gaussian kernel. 
However, it is better than other six methods. 
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Source of 
Curve 

ANN 

ER model 
SVM 

Gaussian 
SVM linear 
SVM 

Polynomial 
SVM 
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SVM 

Polynomial 

Logistic 
Regression 

Reference 
Line 



0.0 0.2 0.4 0.6 0 



1 - Specificity 



Figure 3 ROC curve among eight methods on testing data. Figure 2 shows the ROC curve for six methods on testing data. The AUC of the 

ER based model is the largest among six methods. 



the sensitivity, specificity and AUC of these nine methods 
on training data, while Figure 3 and Table 4 show the re- 
sults on testing data. The experimental results show that 
ER based model can obtain better performance than other 
commonly used machine learning methods. 

Discussion 

LNM has a great effect on the surgical treatment of pa- 
tients with gastric cancer and is an important factor in 
prognosis. Currently, the standard for judging LNM 
mainly depends on morphological indicators and lymph 
node size is the dominant indicator. However, there are 
some different opinions. For example, Fukuya T et al. 



[22] showed CT attenuation and lymph node configur- 
ation could aid in diagnosis of malignant adenopathy. 
Deutsch SJ et al. [23] pointed out that location, size, 
density and contour are not helpful in distinguish benign 
from malignant lymphadenopathy. Therefore, the main 
constraint is that there is no unified criterion for evalu- 
ating LNM preoperatively. 

Several other methods are applied to evaluate LNM. 
The commonly used method is artificial neural network 
(ANN). For example, Bollschweiler et al applied a 
single-layer perception to predict LNM in gastric cancer 
and the accuracy is 79% [24]. However, it has some defi- 
ciencies: 1) ANN is always sensitive to initial parameters 



Table 4 AUC values of nine methods on testing data 


Model 


Sensitivity 


Specificity 


AUC 1 


P value 

(AUC compared with Radiologist) 


ER based model 


0.8951 


0.8 


0.829 ± 0.037 


P < 0.001 


ANN 


0.8453 


0.6429 


0.791 ±0.041 


P < 0.001 


SVM Linear 


0.8723 


0.6765 


0.78 ± 0.046 


P < 0.001 


SVM Gaussion 


0.7657 


0.6036 


0.688 ± 0.059 


P < 0.001 


SVM Ploynomial2 


0.8611 


0.6774 


0.781 ±0.048 


P < 0.001 


SVM PloynomiaB 


0.8662 


0.6667 


0.782 ± 0.048 


P < 0.001 


SVM Ploynomial4 


0.8633 


0.61 1 1 


0.769 ± 0.049 


P < 0.001 


Logistic Regression 


0.8552 


0.6667 


0.793 ± 0.039 


P < 0.001 


Radiologist 2 


0.634 


0.765 


0.757 ±0.042 





^he value of data was AUC + standard deviation. 

2 Taking 7.7 mm as the best cut-off point of maximum lymph node size, and if lymph node size larger than 7.7 mm, it is considered that LNM occurs; otherwise, it 
does not occur. 



Zhou et al. BMC Medical Informatics and Decision Making 2013, 13:123 
http://www.biomedcentral.com/1472-6947/13/123 



Page 7 of 7 



and needs to spend more time evaluating them. 2), it is 
prone to be over fitting. Besides this, other machine 
learning methods are also used for evaluating LNM. 
Compared with ANN, the new model is generated with- 
out assuming system structure or parameters a prior, 
thus no parameters needs to be initialized, which can 
help circumvent the bottleneck of estimating all initial 
parameters. Moreover, the ER based model can obtain a 
higher prediction performance among the presented 
methods. Therefore, it seems that ER based model is a 
suitable method for evaluating LNM. 

Conclusions 

By utilizing the biological behavior information of gastric 
cancer on CT images, the proposed ER based model can 
help effectively diagnose the LNM preoperatively. 

Endnote 

a The international normative TNM classification de- 
scribes the state of the tumor (T), the lymph nodes (N), 
and possible metastases (M). 
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